Skip to content

Better SQL for to-one joins#37819

Open
roji wants to merge 2 commits intodotnet:mainfrom
roji:SplitQueryReferenceJoin
Open

Better SQL for to-one joins#37819
roji wants to merge 2 commits intodotnet:mainfrom
roji:SplitQueryReferenceJoin

Conversation

@roji
Copy link
Member

@roji roji commented Mar 1, 2026

  • Stop adding to-one-joined entity keys to query identifiers
  • Prune unneeded to-one JOINs

Closes #29182
Closes #29662

@roji roji force-pushed the SplitQueryReferenceJoin branch from 184531e to eeaf668 Compare March 2, 2026 08:19
* Stop adding to-one-joined entity keys to query identifiers
* Prune unneeded to-one JOINs

Closes dotnet#29182
@roji roji force-pushed the SplitQueryReferenceJoin branch from eeaf668 to 4a3ef03 Compare March 2, 2026 09:03
@roji roji marked this pull request as ready for review March 2, 2026 16:14
@roji roji requested a review from a team as a code owner March 2, 2026 16:14
Copilot AI review requested due to automatic review settings March 2, 2026 16:14
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves query SQL generation by avoiding redundant identifier expansion and enabling pruning of unnecessary to-one reference joins, especially benefiting split queries (issue #29182).

Changes:

  • Treat single-result/to-one joins as not increasing cardinality, so inner identifiers aren’t added to the outer query identifier.
  • Detect to-one joins via join predicates and mark corresponding LEFT JOINs as prunable.
  • Update many SQL assertion baselines across SqlServer/Sqlite tests to reflect fewer JOINs, projections, and ORDER BY columns.

Reviewed changes

Copilot reviewed 60 out of 60 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/EFCore.Relational/Query/SqlExpressions/SelectExpression.cs Adds to-one join awareness to identifier propagation and marks certain LEFT JOINs as prunable; introduces predicate-based to-one detection.
test/EFCore.Sqlite.FunctionalTests/BulkUpdates/NorthwindBulkUpdatesSqliteTest.cs Updates SQL baselines to remove now-pruned LEFT JOIN subqueries/parameters.
test/EFCore.Sqlite.FunctionalTests/BulkUpdates/NonSharedModelBulkUpdatesSqliteTest.cs Updates SQL baseline to remove redundant LEFT JOIN.
test/EFCore.SqlServer.FunctionalTests/BulkUpdates/NorthwindBulkUpdatesSqlServerTest.cs Updates SQL baselines to remove now-pruned LEFT JOIN subqueries/parameters.
test/EFCore.SqlServer.FunctionalTests/BulkUpdates/NonSharedModelBulkUpdatesSqlServerTest.cs Updates SQL baseline to remove redundant LEFT JOIN.
test/EFCore.SqlServer.FunctionalTests/Query/*.cs (multiple) Updates many SQL baselines (fewer projected key columns, fewer JOINs, simplified ORDER BY).

@roji roji linked an issue Mar 2, 2026 that may be closed by this pull request
@stevendarby
Copy link
Contributor

stevendarby commented Mar 2, 2026

@roji Firstly thanks for tackling this problem which seems to improve a lot of queries!

I have potentially found an issue when a one-to-one is involved. Firstly, depending on which side is declared as the principal and which one you start with as the root of your query, you may either get an INNER JOIN which is never pruned, or, when it's flipped around, I think the identifier matching is not working so it's also not pruned. This is a bit vague from me and I'll try to get a repro/unit test to show it clearly soon, just noting now in case it gives you enough to go on.

@stevendarby
Copy link
Contributor

stevendarby commented Mar 3, 2026

@roji

I put the two tests collapsed at the bottom of the post in AdHocMiscellaneousQuerySqlServerTest. The two contexts they use differ in where the FK for the Blog-Author relationship sits.

SplitTestA sql is:

-- SplitTestContextA

SELECT [b].[Id], [b].[AuthorId], [a].[Id]
FROM [Blogs] AS [b]
INNER JOIN [Author] AS [a] ON [b].[AuthorId] = [a].[Id]
ORDER BY [b].[Id]

-- SplitTestContextA

SELECT [p].[Id], [p].[BlogId], [b].[Id]
FROM [Blogs] AS [b]
INNER JOIN [Author] AS [a] ON [b].[AuthorId] = [a].[Id]
INNER JOIN [Post] AS [p] ON [b].[Id] = [p].[BlogId]
ORDER BY [b].[Id]

The inner join to Author is not pruned. I don't think it's needed, and it's 'fixed' by also passing prunable: isToOneJoin for inner joins, but I do not know the full ramifications of doing that, and maybe it can't be pruned.

SplitTestB sql is:

-- SplitTestContextB

SELECT [b].[Id], [a].[Id], [a].[BlogId]
FROM [Blogs] AS [b]
LEFT JOIN [Author] AS [a] ON [b].[Id] = [a].[BlogId]
ORDER BY [b].[Id], [a].[Id]

-- SplitTestContextB

SELECT [p].[Id], [p].[BlogId], [b].[Id], [a].[Id]
FROM [Blogs] AS [b]
LEFT JOIN [Author] AS [a] ON [b].[Id] = [a].[BlogId]
INNER JOIN [Post] AS [p] ON [b].[Id] = [p].[BlogId]
ORDER BY [b].[Id], [a].[Id]

Here, the left join to Author is not pruned, and I think it's related to the identifier matching process not expecting the one-to-one to be sided this way.

Obviously, if it's simply that this enhancement doesn't or can't cover one-to-one cases, that's fine and it's still an improvement. I've not found any actual breakages yet.

Details
    [ConditionalFact]
    public async Task SplitTestA()
    {
        var contextFactory = await InitializeAsync<SplitTestContextA>(seed: a => a.SeedAsync());
        using var context = contextFactory.CreateContext();

        _ = context
            .Set<SplitTestContextA.Blog>()
            .Include(x => x.Author)
            .Include(x => x.Posts)
            .AsSplitQuery()
            .TagWith(nameof(SplitTestContextA))
            .ToList();

        var sql = TestSqlLoggerFactory.Sql;
    }

    [ConditionalFact]
    public async Task SplitTestB()
    {
        var contextFactory = await InitializeAsync<SplitTestContextB>(seed: a => a.SeedAsync());
        using var context = contextFactory.CreateContext();

        _ = context
            .Set<SplitTestContextB.Blog>()
            .Include(x => x.Author)
            .Include(x => x.Posts)
            .AsSplitQuery()
            .TagWith(nameof(SplitTestContextB))
            .ToList();

        var sql = TestSqlLoggerFactory.Sql;
    }

    protected class SplitTestContextA(DbContextOptions options) : DbContext(options)
    {
        public DbSet<Blog> Blogs { get; set; }

        public class Blog
        {
            public int Id { get; set; }
            public int AuthorId { get; set; }
            public Author Author { get; set; }
            public ICollection<Post> Posts { get; set; }
        }

        public class Author
        {
            public int Id { get; set; }
            public Blog Blog { get; set; }
        }

        public class Post
        {
            public int Id { get; set; }
            public Blog Blog { get; set; }
        }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.Entity<Blog>()
                .HasOne(b => b.Author)
                .WithOne(a => a.Blog)
                .HasForeignKey<Blog>(a => a.AuthorId);
        }

        public async Task SeedAsync()
        {
            Add(new Blog
            {
                Author = new Author(),
                Posts = [new Post()]
            });

            await SaveChangesAsync();
        }
    }

    protected class SplitTestContextB(DbContextOptions options) : DbContext(options)
    {
        public DbSet<Blog> Blogs { get; set; }

        public class Blog
        {
            public int Id { get; set; }
            public Author Author { get; set; }
            public ICollection<Post> Posts { get; set; }
        }

        public class Author
        {
            public int Id { get; set; }
            public int BlogId { get; set; }
            public Blog Blog { get; set; }
        }

        public class Post
        {
            public int Id { get; set; }
            public Blog Blog { get; set; }
        }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.Entity<Blog>()
                .HasOne(b => b.Author)
                .WithOne(a => a.Blog)
                .HasForeignKey<Author>(a => a.BlogId);
        }

        public async Task SeedAsync()
        {
            Add(new Blog
            {
                Author = new Author(),
                Posts = [new Post()]
            });

            await SaveChangesAsync();
        }
    }

@roji
Copy link
Member Author

roji commented Mar 3, 2026

@stevendarby thanks for all of the above, and for checking the change - much appreciated!

Yes, you're right about INNER JOIN not getting pruned.

  • The problem with INNER JOIN is that contrary to LEFT JOIN, is also has a filtering function (when no correlated rows are found on the inner side). In other words, while it's always safe to prune, pruning (the wrong) INNER JOIN could cause rows to be returned which shouldn't be.
  • Now, EF generates INNER JOIN when a required one-to-one navigation is accessed; since the navigation is required, we can assume that a correlated row does exist in the database, and so the INNER JOIN can be pruned in that particular case.
  • However, INNER JOIN can also be produced as a result of a user-specified LINQ Join() operator. In this case, the user has explicitly signaled that they want to filter, and we can't assume (unlike with required navigations) that correlated rows necessarily exist in the database.
  • Now, EF (currently) expands navigations early, in preprocessing (nav expansion); this means that by the time we get to the translation phase (where all this is happening), we've lost the distinction between these two cases: the INNER JOIN might represent a required navigation access, or a user-specified LINQ operator. As a result, we can't reliably apply pruning here, given the current architecture.

Next steps:

  • This is yet another shortcoming of EF's current nav expansion approach - it loses information early in the pipeline #32957. There's already quite a few problems with the current architecture, and getting rid of it is something I've already started looking at for 11.
  • A completely different perspective on this is whether EF should be generating INNER JOIN for required navigations, as opposed to LEFT JOIN; since we assume that the database corresponds to the user's model, we can correctly generate either one - the user guaranteed that there will always be a matching row (see #29645). However, LEFT JOINs in general can be less performant than INNER JOINs for various reasons, so EF's current translation is likely the better one.
  • So I think the right forward here is to merge this PR more or less in its current state, without pruning away INNER JOINs; then, if and when I rearchitect the pipeline to get rid of preprocessing nav expansion, we can revisit this as we'll have the full knowledge to accurately prune the INNER JOIN only when it's not needed.

How does that all sound?

PS even though this PR doesn't remove the INNER JOIN, it does remove the identifiers from the ORDER BY (and also from the projection); that's also a significant improvement, even if not the full optimization as for non-required navigations.

@stevendarby
Copy link
Contributor

@roji thanks for the explanation, that all sounds good to me. I could have missed it but I don't think you covered why the left join isn't pruned from SplitTestB and if that's something that should be addressed or not, but not something to hold this up over.

@@ -634,7 +634,7 @@ LEFT JOIN (
FROM [EntityCompositeKeyEntityTwo] FOR SYSTEM_TIME AS OF '2010-01-01T00:00:00.0000000' AS [e3]
INNER JOIN [EntityCompositeKeys] FOR SYSTEM_TIME AS OF '2010-01-01T00:00:00.0000000' AS [e4] ON [e3].[CompositeKeySkipSharedKey1] = [e4].[Key1] AND [e3].[CompositeKeySkipSharedKey2] = [e4].[Key2] AND [e3].[CompositeKeySkipSharedKey3] = [e4].[Key3]
) AS [s1] ON [e].[Id] = [s1].[TwoSkipSharedId]
ORDER BY [e].[Id], [s].[ThreeId], [s].[TwoId], [s].[Id], [s0].[SelfSkipSharedLeftId], [s0].[SelfSkipSharedRightId], [s0].[Id], [s1].[TwoSkipSharedId], [s1].[CompositeKeySkipSharedKey1], [s1].[CompositeKeySkipSharedKey2], [s1].[CompositeKeySkipSharedKey3], [s1].[Key1], [s1].[Key2]
ORDER BY [e].[Id], [s].[ThreeId], [s].[TwoId], [s0].[SelfSkipSharedLeftId], [s0].[SelfSkipSharedRightId], [s1].[TwoSkipSharedId], [s1].[CompositeKeySkipSharedKey1], [s1].[CompositeKeySkipSharedKey2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hoped to get my head around this more before commenting, but may not have time. Bit unsure of a few cases like this where keys are partially pruned from the order by, e.g. why just CompositeKeySkipSharedKey3 pruned? May be fine! In mind is a possible interaction with that mechanism that removes the last column from the order by when collections joins are involved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider removing reference navigation keys from order by Remove reference joins in split queries

4 participants